A Stochastic Approach to Median String Computation

نویسندگان

  • Cristian Olivares-Rodríguez
  • José Oncina
چکیده

Due to its robustness to outliers, many Pattern Recognition algorithms use the median as a representative of a set of points. A special case arises in Syntactical Pattern Recognition when the points (prototypes) are represented by strings. However, when the edit distance is used, finding the median becomes a NP-Hard problem. Then, either the search is restricted to strings in the data (set-median) or some heuristic approach is applied. In this work we use the (conditional) stochastic edit distance instead of the plain edit distance. It is not yet known if in this case the problem is also NP-Hard so an approximation algorithm is described. The algorithm is based on the extension of the string structure to multistrings (strings of stochastic vectors where each element represents the probability of each symbol) to allow the use of the Expectation Maximization technique. We carry out some experiments over a chromosomes corpus to check the efficiency of the algorithm.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Stochastic Models for Pricing Weather Derivatives using Constant Risk Premium

‎Pricing weather derivatives is becoming increasingly useful‎, ‎especially in developing economies‎. ‎We describe a statistical model based approach for pricing  weather derivatives by modeling and forecasting daily average temperatures data which exhibits long-range dependence‎. ‎We pre-process the temperature data by filtering for seasonality and volatility an...

متن کامل

Parallel computation framework for optimizing trailer routes in bulk transportation

We consider a rich tanker trailer routing problem with stochastic transit times for chemicals and liquid bulk orders. A typical route of the tanker trailer comprises of sourcing a cleaned and prepped trailer from a pre-wash location, pickup and delivery of chemical orders, cleaning the tanker trailer at a post-wash location after order delivery and prepping for the next order. Unlike traditiona...

متن کامل

Incorporating Wind Power Generation And Demand Response into Security-Constrained Unit Commitment

Wind generation with an uncertain nature poses many challenges in grid integration and secure operation of power system. One of these operation problems is the unit commitment. Demand Response (DR) can be defined as the changes in electric usage by end-use customers from their normal consumption patterns in response to the changes in the price of electricity over time. Further, DR can be also d...

متن کامل

A New Model to Speculate CLV Based on Markov Chain Model

The present study attempts to establish a new framework to speculate customer lifetime value by a stochastic approach. In this research the customer lifetime value is considered as combination of customer’s present and future value. At first step of our desired model, it is essential to define customer groups based on their behavior similarities, and in second step a mechanism to count current ...

متن کامل

Learning state machine-based string edit kernels

During the past few years, several works have been done to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden markov model) and compares two strings according to how they are generated by M . On the other hand, the marginalized kernels allow the computation of the joint similarity between two instances by summing condit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008